Job Talk
https://lilykoff.github.io/job_talk2025
Outline
Digital fingerprinting with accelerometry data
Big picture method: time series to scalar predictors
Each row in X is a second of data
Fit n regression models (one vs. rest)
Details of the method
For each second and each person:
Obtain joint distribution of acceleration and lag acceleration for a series of lags
Either:
- Obtain summaries of the joint distribution
- Use full joint distribution directly in functional regression
To illustrate, we walk through the process for one second, one person, and one lag
Obtain joint distribution of acceleration and lag acceleration
Derive predictors from joint distribution
Repeat for multiple lags
Repeat for all seconds
Repeat for all people
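The steps above can be sketched in code under some stated assumptions. The slides do not say which summaries of the joint distribution are used, so this sketch assumes one plausible choice: counts of (lagged, current) acceleration pairs in each cell of a grid. The function names, the lags, the cut points, and the simulated signals are all hypothetical.

```python
import numpy as np

def grid_cell_counts(second, lag, cut_points):
    """Summarize the joint distribution of (v(s - lag), v(s)) for one second
    as counts of pairs in each cell of a grid (assumed summary). lag must be >= 1."""
    second = np.asarray(second, dtype=float)
    lagged, current = second[:-lag], second[lag:]
    cuts = np.asarray(cut_points, dtype=float)
    counts = np.zeros((len(cuts) + 1, len(cuts) + 1), dtype=int)
    # np.digitize maps each value to the index of the interval it falls in
    np.add.at(counts, (np.digitize(lagged, cuts), np.digitize(current, cuts)), 1)
    return counts.ravel()

def build_design_matrix(signals, lags, cut_points, fs=100):
    """Repeat for all lags, seconds, and people.
    signals: dict mapping person -> 1-D acceleration signal sampled at fs Hz.
    Returns X (one row per second) and the person label of each row."""
    rows, labels = [], []
    for person, signal in signals.items():
        signal = np.asarray(signal, dtype=float)
        for j in range(len(signal) // fs):
            second = signal[j * fs:(j + 1) * fs]
            rows.append(np.concatenate(
                [grid_cell_counts(second, lag, cut_points) for lag in lags]))
            labels.append(person)
    return np.array(rows), np.array(labels)

# Simulated data: two people, three seconds each (not data from the talk)
rng = np.random.default_rng(0)
t = np.arange(300) / 100
signals = {
    "A": 1.0 + 0.3 * np.sin(2 * np.pi * 1.8 * t) + rng.normal(0, 0.05, 300),
    "B": 1.0 + 0.2 * np.sin(2 * np.pi * 2.1 * t) + rng.normal(0, 0.05, 300),
}
X, person = build_design_matrix(signals, lags=[15, 30, 45],
                                cut_points=[0.75, 1.0, 1.25])
```

With three cut points the grid has 16 cells, so three lags give 48 predictors per second, and each row of `X` corresponds to one second of data.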
Fit models
- \(n\) models, one for each person
- Model \(j\) predicts probability that second \(i\) is from person \(j\)
- Max prediction across all models is the predicted person for that second
- Models include: logistic regression w/ variable selection, lasso, random forest, XGBoost, etc.
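A minimal sketch of the one-vs-rest scheme, assuming a plain gradient-descent logistic regression as a stand-in for any of the classifiers listed above. The helper names and the simulated two-predictor data are hypothetical; the talk's models use many more predictors and variable selection.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Gradient-descent logistic regression (stand-in for any classifier)."""
    Xb = np.column_stack([np.ones(len(X)), X])   # add intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    Xb = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def one_vs_rest_fit(X, person):
    """One model per person: is this second from person j (1) or anyone else (0)?"""
    return {j: fit_logistic(X, (person == j).astype(float))
            for j in np.unique(person)}

def one_vs_rest_predict(models, X):
    """Predicted person = the model with the highest probability for that second."""
    ids = np.array(sorted(models))
    probs = np.column_stack([predict_proba(models[j], X) for j in ids])
    return ids[probs.argmax(axis=1)]

# Simulated predictors for 3 people, 50 seconds each (not data from the talk)
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
person = np.repeat([0, 1, 2], 50)
X = centers[person] + rng.normal(0, 0.3, (150, 2))

models = one_vs_rest_fit(X, person)
accuracy = (one_vs_rest_predict(models, X) == person).mean()
```

Here accuracy is computed on the training seconds only to keep the sketch short; the talk evaluates on held-out seconds.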
Aside: functional regression approach
\[\text{logit}(p_{ij}^{i_0}) =\beta_0^{i_0} + \int_{u=1}^S\int_{s=u}^SF_{i_0}\{ v_{ij}(s), v_{ij}(s-u), u\}dsdu \]
\(u = 1, \dots, S\): lag, where \(S = 100\) is the number of observations per second
\(v_{ij}(s)\) = acceleration at centisecond \(s\) for subject \(i\) in second \(j\)
\(F_{i_0}(\cdot, \cdot, \cdot)\): trivariate smooth function
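To make the double integral concrete, here is a sketch that approximates it by a Riemann sum with unit spacing over one second of data. It assumes 0-based indexing (so `v[s - u]` is always defined) and takes the smooth function as a user-supplied callable; `toy_F` is a made-up placeholder, not an estimated surface. Estimating \(F_{i_0}\) itself, for example with tensor-product splines, is not shown.

```python
import math

def linear_predictor(v, F, beta0=0.0):
    """Approximate beta0 + sum over lags u and times s of F(v(s), v(s - u), u)."""
    S = len(v)
    total = 0.0
    for u in range(1, S):        # lag, in samples
        for s in range(u, S):    # 0-based time index, so s - u >= 0
            total += F(v[s], v[s - u], u)
    return beta0 + total

def probability(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))

def toy_F(current, lagged, u):
    """Hypothetical placeholder for the trivariate smooth function."""
    return 0.001 * (current - 1.0) * (lagged - 1.0) / u

# One simulated second at 100 Hz (not data from the talk)
second = [1.0 + 0.3 * math.sin(2 * math.pi * 2 * k / 100) for k in range(100)]
eta = linear_predictor(second, toy_F, beta0=-2.0)
p = probability(eta)
```

With 100 samples per second, the sum runs over 4,950 (time, lag) pairs, which is why the functional approach uses the full joint distribution rather than a handful of summaries.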
“Fingerprints” summarize predictors for a given lag and are different across individuals
The method works!
- Applied to three datasets
  - \(30\) people, \(6\) min of walking each, outdoors
  - \(153\) people, \(2\) min of walking each, indoors
    - Repeated sessions \(1\) week to \(6\) months apart
  - \(14{,}000\) people who wore accelerometer for \(7\) days
    - Used segmentation algorithm to ID walking
    - Then used \(3\) min of data from each person
- Oversampling + weighting w/ logistic regression to overcome class imbalance
- Two train/test paradigms
  - Random: seconds from all people randomly assigned to train/test
  - Temporal: some days/sessions assigned to train, other days to test
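The two train/test paradigms can be sketched as follows, assuming each second is stored as a record with hypothetical `person`, `session`, and `second` fields. The test fraction and the choice of held-out session are illustrative.

```python
import random

def random_split(records, test_frac=0.25, seed=0):
    """Random paradigm: seconds are assigned to train/test without regard to session."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]

def temporal_split(records, test_sessions):
    """Temporal paradigm: whole days/sessions are held out for testing."""
    train = [r for r in records if r["session"] not in test_sessions]
    test = [r for r in records if r["session"] in test_sessions]
    return train, test

# 3 people x 2 sessions x 10 seconds of made-up records
records = [{"person": p, "session": sess, "second": j}
           for p in ("A", "B", "C") for sess in (1, 2) for j in range(10)]

rand_train, rand_test = random_split(records)
temp_train, temp_test = temporal_split(records, test_sessions={2})
```

The temporal split is the harder test: the model never sees any second from the held-out session, so it has to recognize a person from walking recorded at a different time.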
Train/test paradigms, visualized
So what?
Detour through NHANES: open source step counting
- National Health and Nutrition Examination Survey (NHANES)
- Nationally representative survey of US
- Free-living accelerometry
- Physical activity summaries: not interpretable or translatable
- Steps: easy to understand measure of physical activity
- Can we accurately count steps from free-living accelerometry?
Open source step counting
Step estimates vary greatly between algorithms
But all algorithms estimate decline with age
Being in higher step quartile associated with lower adjusted mortality risk
Detour through NHANES: survey-weighted functional regression
Question motivated by NHANES: how are physical activity patterns associated with covariates like age, sex?
Survey-weighted functional regression
We can answer this question with function-on-scalar regression (FoSR):
\[\mathbb{E}[\mathrm{MIMS}_i(s)] = \beta_0(s) + \beta_1(s)\mathrm{gender}_i + \beta_2(s)\mathrm{age}_i \]
Implementation: fast univariate inference (FUI)
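A rough sketch of the first two steps of FUI for this model: fit a separate regression at each point \(s\), then smooth each coefficient across \(s\). A moving average stands in for the penalized smoother, the inference step (confidence bands) is omitted, and the data, coefficient shapes, and function names are simulated or hypothetical.

```python
import numpy as np

def pointwise_ols(Y, X):
    """Fit a separate linear regression at each point s.
    Y: (n_subjects, n_points) outcomes; X: (n_subjects, n_covariates).
    Returns coefficients of shape (n_covariates + 1, n_points), intercept first."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return coef

def smooth_rows(coef, window=5):
    """Moving-average smoother along s (window must be odd)."""
    kernel = np.ones(window) / window
    half = window // 2
    padded = np.pad(coef, ((0, 0), (half, half)), mode="edge")
    return np.vstack([np.convolve(row, kernel, mode="valid") for row in padded])

# Simulated activity curves on 24 grid points (not NHANES data)
rng = np.random.default_rng(0)
n, n_points = 200, 24
s = np.linspace(0, 1, n_points)
gender = rng.integers(0, 2, n)
age = rng.uniform(20, 80, n)
beta1 = np.sin(2 * np.pi * s)            # assumed true gender effect
Y = ((10 + 5 * s)[None, :]
     + np.outer(gender, beta1)
     + np.outer(age, np.full(n_points, -0.05))
     + rng.normal(0, 0.5, (n, n_points)))

raw = pointwise_ols(Y, np.column_stack([gender, age]))
smoothed = smooth_rows(raw)              # rows: beta_0(s), beta_1(s), beta_2(s)
```

Fitting each \(s\) separately is what makes the approach fast: every fit is an ordinary regression, and the functional structure only enters through the smoothing step.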
But: NHANES is not a simple random sample
Are our estimates valid for population-level inference?
\(\texttt{svyfosr}\): first survey-weighted functional regression implementation in R
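To show where survey weights enter, here is a sketch of the weighted least-squares fit that would replace the ordinary fit at a single point \(s\). This is an illustration of the idea only, not the \(\texttt{svyfosr}\) API or implementation. It gives point estimates and ignores design-based variance (strata, clusters); the function name and toy data are hypothetical.

```python
import numpy as np

def weighted_ls(y, X, weights):
    """Weighted least squares: minimizes sum_i w_i * (y_i - x_i' beta)^2.
    Returns coefficients with the intercept first."""
    design = np.column_stack([np.ones(len(y)), X])
    sw = np.sqrt(np.asarray(weights, dtype=float))
    coef, *_ = np.linalg.lstsq(design * sw[:, None], np.asarray(y) * sw, rcond=None)
    return coef

# Toy data: the outcome is curved in x, so a straight line is misspecified
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 4.0, 9.0])

equal = weighted_ls(y, x, np.ones(4))                            # unweighted fit
upweighted = weighted_ls(y, x, np.array([1.0, 1.0, 1.0, 10.0]))  # last unit counts 10x
```

The two fits differ because the weights change which part of the population the line is fit to, which is the reason unweighted estimates from a complex survey may not describe the target population.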
Digital fingerprinting with hemodynamics data
Preliminary hemodynamics work
Coarse literature targets for MAP and CVP
MAP and CVP are not independent; they occur simultaneously
- Fit XGBoost model on 727 patients
- Mean (SD) \(7 (1.8)\) minutes per patient, range \(3\)-\(16\) minutes
- Obtain predictors for many different lags and cut points
- Use predictors that are top 10 contributors to first 30 PCs (\(\approx 100\) predictors)
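The predictor screening in the last bullet could look roughly like this, assuming that "contribution" means the absolute PCA loading and that predictors are standardized first; neither detail is stated on the slide. The function name and the simulated matrix are hypothetical.

```python
import numpy as np

def top_pc_contributors(X, n_pcs=30, n_top=10):
    """Indices of predictors that are among the n_top largest absolute
    loadings of any of the first n_pcs principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)            # standardize (assumes no constant columns)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)    # rows of Vt are PC loadings
    selected = set()
    for loadings in Vt[:n_pcs]:
        selected.update(np.argsort(np.abs(loadings))[-n_top:].tolist())
    return sorted(selected)

# Simulated predictor matrix: 200 seconds x 50 predictors (not patient data)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
keep = top_pc_contributors(X, n_pcs=3, n_top=5)
X_reduced = X[:, keep]
```

Because the same predictor can rank highly on several components, the union is smaller than `n_pcs * n_top`, consistent with 30 PCs and 10 contributors each yielding roughly 100 predictors.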
Future Directions
- Using changes in fingerprint (both walking and waveform) to predict changes in function
- Designing real-time interventions based on hemodynamics patterns
- Extending survey FoSR to longitudinal outcomes
- Standardizing processing and analysis pipelines for wearable accelerometry
Thank you!
- Link to slides: [https://lilykoff.github.io/job_talk2025](https://lilykoff.github.io/job_talk2025)
- Email: [lkoffma2@jh.edu](mailto:lkoffma2@jh.edu)
- Website: [lilykoff.com](https://lilykoff.com)
- Github: [github.com/lilykoff](https://github.com/lilykoff)